Search | WHO COVID-19 Research Database

Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data (preprint)

Tingyi Wanyan; Jing Zhang; Ying Ding; Ariful Azad; Zhangyang Wang; Benjamin S Glicksberg.

arxiv; 2021.

Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2104.02932v1

ABSTRACT

Electronic Health Record (EHR) data has been of tremendous utility in Artificial Intelligence (AI) for healthcare such as predicting future clinical events. These tasks, however, often come with many challenges when using classical machine learning models due to a myriad of factors including class imbalance and data heterogeneity (i.e., the complex intra-class variances). To address some of these research gaps, this paper leverages the exciting contrastive learning framework and proposes a novel contrastive regularized clinical classification model. The contrastive loss is found to substantially augment EHR-based prediction: it effectively characterizes the similar/dissimilar patterns (by its "push-and-pull" form), meanwhile mitigating the highly skewed class distribution by learning more balanced feature spaces (as also echoed by recent findings). In particular, when naively exporting the contrastive learning to the EHR data, one hurdle is in generating positive samples, since EHR data is not as amendable to data augmentation as image data. To this end, we have introduced two unique positive sampling strategies specifically tailored for EHR data: a feature-based positive sampling that exploits the feature space neighborhood structure to reinforce the feature learning; and an attribute-based positive sampling that incorporates pre-generated patient similarity metrics to define the sample proximity. Both sampling approaches are designed with an awareness of unique high intra-class variance in EHR data. Our overall framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data with a total of 5,712 patients admitted to a large, urban health system. Specifically, our method reaches a high AUROC prediction score of 0.959, which outperforms other baselines and alternatives: cross-entropy(0.873) and focal loss(0.931).

Subject(s)

COVID-19

Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients (preprint)

Tingyi Wanyan; Hossein Honarvar; Suraj K. Jaladanki; Chengxi Zang; Nidhi Naik; Sulaiman Somani; Jessica K. De Freitas; Ishan Paranjpe; Akhil Vaid; Riccardo Miotto; Girish N. Nadkarni; Marinka Zitnik; ArifulAzad; Fei Wang; Ying Ding; Benjamin S. Glicksberg.

arxiv; 2021.

Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2101.04013v1

ABSTRACT

Machine Learning (ML) models typically require large-scale, balanced training data to be robust, generalizable, and effective in the context of healthcare. This has been a major issue for developing ML models for the coronavirus-disease 2019 (COVID-19) pandemic where data is highly imbalanced, particularly within electronic health records (EHR) research. Conventional approaches in ML use cross-entropy loss (CEL) that often suffers from poor margin classification. For the first time, we show that contrastive loss (CL) improves the performance of CEL especially for imbalanced EHR data and the related COVID-19 analyses. This study has been approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai. We use EHR data from five hospitals within the Mount Sinai Health System (MSHS) to predict mortality, intubation, and intensive care unit (ICU) transfer in hospitalized COVID-19 patients over 24 and 48 hour time windows. We train two sequential architectures (RNN and RETAIN) using two loss functions (CEL and CL). Models are tested on full sample data set which contain all available data and restricted data set to emulate higher class imbalance.CL models consistently outperform CEL models with the restricted data set on these tasks with differences ranging from 0.04 to 0.15 for AUPRC and 0.05 to 0.1 for AUROC. For the restricted sample, only the CL model maintains proper clustering and is able to identify important features, such as pulse oximetry. CL outperforms CEL in instances of severe class imbalance, on three EHR outcomes with respect to three performance metrics: predictive power, clustering, and feature importance. We believe that the developed CL framework can be expanded and used for EHR ML work in general.

Subject(s)

COVID-19 , Coronavirus Infections , Extravasation of Diagnostic and Therapeutic Materials

Federated Learning of Electronic Health Records Improves Mortality Prediction in Patients Hospitalized with COVID-19 (preprint)

Akhil Vaid; Suraj K Jaladanki; Jie Xu; Shelly Teng; Arvind Kumar; Samuel Lee; Sulaiman Somani; Ishan Paranjpe; Jessica K De Freitas; Tingyi Wanyan; Kipp W Johnson; Mesude Bicak; Eyal Klang; Young Joon Kwon; Anthony Costa; Shan Zhao; Riccardo Miotto; Alexander W Charney; Erwin Böttinger; Zahi A Fayad; Girish N Nadkarni; Fei Wang; Benjamin S Glicksberg.

medrxiv; 2020.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.08.11.20172809

ABSTRACT

Machine learning (ML) models require large datasets which may be siloed across different healthcare institutions. Using federated learning, a ML technique that avoids locally aggregating raw clinical data across multiple institutions, we predict mortality within seven days in hospitalized COVID-19 patients. Patient data was collected from Electronic Health Records (EHRs) from five hospitals within the Mount Sinai Health System (MSHS). Logistic Regression with L1 regularization (LASSO) and Multilayer Perceptron (MLP) models were trained using local data at each site, a pooled model with combined data from all five sites, and a federated model that only shared parameters with a central aggregator. Both the federated LASSO and federated MLP models performed better than their local model counterparts at four hospitals. The federated MLP model also outperformed the federated LASSO model at all hospitals. Federated learning shows promise in COVID-19 EHR data to develop robust predictive models without compromising patient privacy.

Subject(s)

COVID-19

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL